Chapter 1 

Linear Algebra for Signals and Systems 
A Matrix times a Vector 



The tools and ideas from abstract algebra, linear algebra, and functional analysis can be extremely useful 
to signal processing and many other areas of engineering and science. Indeed, many of the most important 
ideas can be developed from the simple operator equation 

Ax=b (1.1) 

by considering it in a variety of ways. If x and b are vectors from the same or, perhaps, different vector 
spaces and A is an operator, there are three interesting questions that can be asked which provide a setting 
for a broad study. 

1. Given A and x , find b . The analysis or operator problem. 

2. Given A and b , find x . The inverse or control problem. 

3. Given x and b , find A . The synthesis or design problem. 

Much can be learned by studying each of these problems in some detail. We will generally look at the finite 
dimensional problem where (1.1) can easily be studied as a matrix multiplication, but try to indicate what 
the infinite dimensional case might be [28] [12] [30] [2] [31] [22] [20]. Other older systems theory books based on 
vector space methods are [25] [9], one applied to signal theory is [11], and to optimization, [18]. The case 
for periodically time-varying linear systems is considered by Shenoy [27]. Multiscale system theory and a 
development of vector space ideas for the purpose of presenting wavelet representations have also been 
treated in the literature, and an interesting idea of unconditional bases is given by Donoho. 

The ideas of similarity transformations, diagonalization, the eigenvalue problem, Jordan normal form, 
singular value decomposition, rank, range, domain, nullity, etc. will later be developed and interpreted in 
these notes. 

1.1 A Matrix Times a Vector 

In this section we consider the first problem posed in the introduction where A and x are given and we want 
to interpret and give structure to the calculation of b . Equation (1.1) has a variety of special cases. The 
matrix A may be square or rectangular. It may have full column or row rank or it may not. It 
may be symmetric, orthogonal, or non-singular, or have other characteristics that give it interesting 
properties as an operator. If we view the vectors as signals and the matrix as an operator, there are two 
interesting interpretations. 

• The operation (1.1) is a change of basis or coordinates for a fixed signal. The signal stays the same, 
the basis changes. 

• The operation (1.1) alters the characteristics of the signal but within a fixed basis system. The basis 
stays the same, the signal changes. 

An example of the first would be the discrete Fourier transform (DFT) where one calculates frequency 
components of a signal which are coordinates in a frequency space for a given signal. An example of the 
second might be convolution where you are processing or filtering a signal and staying in the same space or 
coordinate system. 

A particularly powerful sequence of operations is to first change the basis for a signal, then process the 
signal in this new basis, and finally return to the original basis. For example, the discrete Fourier transform 
(DFT) of a signal is taken followed by setting some of the Fourier coefficients to zero followed by taking the 
inverse DFT. 
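
The change-basis, process, return sequence can be sketched with NumPy's FFT routines. This is only an illustration under stated assumptions: the test signal, its length N = 64, and the choice of which Fourier coefficients to zero are made up for the example.

```python
import numpy as np

N = 64                                   # assumed signal length
n = np.arange(N)
slow = np.sin(2 * np.pi * 3 * n / N)     # assumed slow component (frequency index 3)
fast = 0.5 * np.sin(2 * np.pi * 20 * n / N)  # assumed fast component (frequency index 20)
x = slow + fast

X = np.fft.fft(x)                        # change of basis: time -> frequency
X_mod = X.copy()
X_mod[[20, N - 20]] = 0                  # process in the new basis: zero one conjugate pair
y = np.fft.ifft(X_mod).real              # return to the original basis

max_err = np.max(np.abs(y - slow))       # y should match the slow component
```

Because the zeroed bins form a conjugate pair, the inverse transform is real up to rounding, and only the fast sinusoid is removed.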



Chapter 2 

Change of Basis and Change to Signal 



2.1 Change of Basis 

The operation given in (1.1) can be viewed as x being a signal vector and b being a vector whose entries are 
inner products of x and the rows of A. In other words, the elements of b are the projection coefficients of 
x onto the coordinates given by the rows of A. The multiplication of a signal by this operator decomposes 
it and gives the coefficients of the decomposition. 

An alternative view has x being a set of weights so that b is a weighted sum of the columns of A . In 
other words, b will lie in the space spanned by the columns of A at a location determined by x . This 
view is a composition of a signal from a set of weights which could have been obtained from a previous 
decomposition. 
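
The two views give the same vector b, which a small numerical sketch can confirm. The matrix and vector below are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
x = np.array([3.0, 1.0, 2.0])

# Decomposition view: each entry of b is the inner product of x with a row of A.
b_rows = np.array([np.dot(row, x) for row in A])

# Composition view: b is a weighted sum of the columns of A, with weights taken from x.
b_cols = sum(x[k] * A[:, k] for k in range(A.shape[1]))

b = A @ x                                # ordinary matrix-vector product
```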

These two views of the operation as a decomposition of a signal or the recomposition of the signal to 
or from a different basis system are extremely valuable in signal analysis. The ideas of orthogonality, rank, 
adjoint, etc. are all important here. The dimensions of the domain and range of the operators may or may 
not be the same. The matrices may or may not be square and may or may not be of full rank [13]. 

A set of linearly independent vectors x_n forms a basis for a vector space if every vector x in the space 
can be uniquely written 

x = \sum_n a_n x_n    (2.1) 

and the dual basis vectors \tilde{x}_n allow a simple inner product to calculate the expansion coefficients as 

a_n = \langle x, \tilde{x}_n \rangle = x^T \tilde{x}_n    (2.2) 

(2.1) can be written as a matrix operation 

F a = x    (2.3) 

where the columns of F are the basis vectors and the vector a has the expansion coefficients a_n as entries. 
Equation (2.2) can also be written as a matrix operation 

\tilde{F} x = a    (2.4) 

which has the dual basis vectors as rows of \tilde{F}. From (2.3) and (2.4), we have 

F \tilde{F} x = x    (2.5) 

Since this is true for all x, 

F \tilde{F} = I    (2.6) 

or 

\tilde{F} = F^{-1}    (2.7) 

which states the dual basis vectors are the rows of the inverse of the matrix whose columns are the basis 
vectors (and vice versa). When the vector set is a basis, F is necessarily square and from (2.3) and (2.4), 
one can show 

F \tilde{F} = \tilde{F} F.    (2.8) 

Because this system requires two basis sets, the expansion basis and the dual basis, it is called biorthogonal. 
If the basis vectors are not only independent but orthonormal, the basis set is its own dual and the inverse 
of F is simply its transpose. 

\tilde{F} = F^T    (2.9) 

When done in Hilbert spaces, this decomposition is sometimes called an abstract Fourier expansion. 
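
As a minimal sketch of the biorthogonal expansion, the following computes the dual basis as the rows of the inverse of the basis matrix, as (2.7) states, and then checks the orthonormal special case (2.9). The particular basis vectors, the test vector, and the rotation angle are assumptions chosen for illustration.

```python
import numpy as np

# Columns of F are an assumed non-orthogonal basis for R^2.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
F_dual = np.linalg.inv(F)        # rows are the dual basis vectors, as in (2.7)

x = np.array([2.0, 3.0])
a = F_dual @ x                   # analysis (2.4): expansion coefficients
x_rec = F @ a                    # synthesis (2.3): rebuild x from the basis vectors

# Orthonormal case: a rotation matrix has orthonormal columns,
# so its inverse equals its transpose, as in (2.9).
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q_dual = np.linalg.inv(Q)
```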

2.2 Frames and Tight Frames 

If a set of vectors spans a space but is not linearly independent, (2.1) still holds but the expansion is no longer unique. 
The set of vectors is called a frame for the space [32] [6] [16] [8] and is redundant in the sense that there are more 
vectors than necessary for a basis. The finite dimensional matrix version of this case would have F in (2.3) with 
more columns than rows but with full row rank. The dual frame vectors are also not unique but a set can 
be found such that (2.4) and, therefore, (2.5) hold (but (2.8) does not). A set of dual frame vectors could 
be found by adding a set of arbitrary but independent rows to F until it is square, inverting it, then taking 
the first N columns (N being the number of rows of F) to form the dual matrix \tilde{F} of (2.4), whose rows will be 
a set of dual frame vectors. This method of construction shows the non-uniqueness of the dual frame vectors. 
This non-uniqueness is often resolved by minimizing some other parameter of the system [8]. 
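
The construction just described can be sketched numerically. The frame matrix, the two choices of appended rows, and the helper name dual_frame are all assumptions for illustration. The pseudo-inverse is included as one common way to pick a single dual frame: it gives the coefficient vector of minimum norm.

```python
import numpy as np

# Assumed frame for R^2: three columns (more columns than rows), full row rank.
F = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

def dual_frame(F, extra_rows):
    """Append rows to make F square, invert, and keep the first N columns."""
    N = F.shape[0]
    G = np.vstack([F, extra_rows])       # square matrix, assumed invertible
    return np.linalg.inv(G)[:, :N]       # rows of the result are dual frame vectors

D1 = dual_frame(F, np.array([[0.0, 0.0, 1.0]]))
D2 = dual_frame(F, np.array([[1.0, 1.0, 1.0]]))
D_pinv = np.linalg.pinv(F)               # minimum-norm choice of coefficients
```

Each of D1, D2, and D_pinv satisfies (2.5) when multiplied on the left by F, yet they differ from one another, and none of them commutes with F as in (2.8).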

If the matrix operations are implementing a frame decomposition and the rows of F are orthonormal, 
the dual frame matrix is \tilde{F} = F^T and the vector set is called a tight frame [32][8]. If the frame vectors 
are normalized to \|x_k\| = 1, the decomposition in (2.1) becomes 

x = \frac{1}{A} \sum_n \langle x, x_n \rangle x_n    (2.10) 

where the constant A is a measure of the redundancy of the expansion which has more expansion vectors 
than necessary [8]. 
The matrix form is 

x = \frac{1}{A} F F^T x    (2.11) 

where F has more columns than rows. Examples can be found in [4]. 
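
A small tight frame makes the role of the constant A concrete. The example below is an assumption chosen for illustration: three unit vectors in the plane spaced 120 degrees apart, for which A = 3/2.

```python
import numpy as np

# Assumed example: three unit-norm frame vectors in R^2, 120 degrees apart.
angles = np.deg2rad([90.0, 210.0, 330.0])
F = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x 3, columns are the frame vectors

A_const = 3 / 2                                   # redundancy: 3 vectors for 2 dimensions
x = np.array([0.7, -1.2])
coeffs = F.T @ x                                  # inner products <x, x_n>
x_rec = (1 / A_const) * (F @ coeffs)              # matrix form x = (1/A) F F^T x
```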

Frames and tight frames don't seem to be particularly useful in finite dimensions, but become important 
in infinite dimensional signal analysis, especially using the new idea of wavelet basis functions [8]. 

In an infinite dimensional vector space, if basis vectors are chosen such that all expansions converge very 
rapidly, the basis is called an unconditional basis and is near optimal for a wide class of signal representation 
and processing problems. This is discussed by Donoho. 

Still another view of a matrix operator being a change of basis can be developed using the eigenvectors 
(or singular vectors) of an operator as the basis vectors. Then a signal can be decomposed into its eigenvector 
components, which are then simply multiplied by the scalar eigenvalues to accomplish the same task as a 
general matrix multiplication. This is an interesting idea but will not be developed here. 
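
Although the idea is not developed in these notes, a brief sketch shows what it means in the simplest setting. The symmetric matrix below is an assumption, chosen so that the eigenvectors form an orthonormal basis.

```python
import numpy as np

# Assumed symmetric operator, so its eigenvectors are orthonormal.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, V = np.linalg.eigh(A)     # columns of V are orthonormal eigenvectors

x = np.array([1.0, 3.0])
c = V.T @ x                        # decompose x into eigenvector components
b_eig = V @ (eigvals * c)          # scale each component by its eigenvalue, then recompose
b = A @ x                          # direct matrix multiplication for comparison
```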

2.3 Change of Signal 

If both x and b are considered to be signals in the same coordinate or basis system, the matrix operator 
A is generally square. It may or may not be of full rank and it may or may not have a variety of other 
properties, but both x and b are viewed in the same coordinate system. 



One method of understanding and generating matrices of this sort is to construct them as a product of first 
a decomposition operator, then a modification operator in the new basis system, followed by a recomposition 
operator. For example, one could first multiply a signal by the DFT operator which will change it into 
the frequency domain. One (or more) of the frequency coefficients could be removed (set to zero) and the 
remainder multiplied by the inverse DFT operator to give a signal back in the time domain but changed by 
having a frequency component removed. That is a form of signal filtering. 
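
At the operator level, the three stages collapse into a single square matrix acting in the time domain. The sketch below builds that matrix explicitly. The size N = 8, the removed frequency, and the test signal are assumptions for illustration.

```python
import numpy as np

N = 8
k = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(k, k) / N)   # DFT matrix (decomposition operator)
W_inv = np.conj(W) / N                         # inverse DFT matrix (recomposition operator)

mask = np.ones(N)
mask[[2, N - 2]] = 0                           # remove frequency 2 (a conjugate pair of bins)
A = (W_inv @ np.diag(mask) @ W).real           # one square operator, same basis in and out

x = np.cos(2 * np.pi * 2 * k / N) + 1.0        # assumed signal: constant plus frequency 2
b = A @ x                                      # filtering as one matrix-vector product
rank_A = np.linalg.matrix_rank(A, tol=1e-10)   # two coefficients were zeroed
```

Here A is square but not of full rank, which is one instance of the remark above that the operator may or may not have full rank.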

It would be instructive for the reader to make sense out of the cryptic statement "the DFT diagonalizes 
the cyclic convolution matrix" to add to the ideas in this note. 
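
One way to start exploring that statement numerically is sketched below. The filter coefficients and the length N = 6 are assumptions. The check shows that the change of basis W C W^{-1} leaves only diagonal entries, and that these equal the DFT of the filter.

```python
import numpy as np

N = 6
h = np.array([1.0, 2.0, 0.0, -1.0, 0.0, 0.0])     # assumed filter of length N
k = np.arange(N)
# Cyclic convolution matrix: C[m, n] = h[(m - n) mod N].
C = h[(k[:, None] - k[None, :]) % N]

W = np.exp(-2j * np.pi * np.outer(k, k) / N)      # DFT matrix
D = W @ C @ np.conj(W) / N                        # W C W^{-1}
off_diag = D - np.diag(np.diag(D))                # should vanish up to rounding

x = np.array([1.0, 0.0, 2.0, 0.0, -1.0, 3.0])     # assumed test signal
y_matrix = C @ x                                  # cyclic convolution in the time domain
y_fft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(x)).real
```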
Chapter 3 

General Solutions of Simultaneous Equations 



The second problem posed in the introduction is basically the solution of simultaneous linear equations which 
is fundamental to linear algebra and very important in diverse areas of applications in mathematics, numerical 
analysis, physical and social sciences, engineering, and business. Since a system of linear equations may be 
over or under determined in a variety of ways, or may be consistent but ill conditioned, a comprehensive 
theory turns out to be more complicated than it first appears. Indeed, there is a considerable literature on 
the subject of generalized inverses or pseudo-inverses. The careful statement and formulation of the general 
problem seems to have started with Moore [21] and Penrose [23][24] and has been developed by many others, 
especially in statistics. Because the generalized solution of simultaneous equations is defined in terms of 
minimization of an equation error, the techniques are useful in a wide variety of approximation problems. 

The ideas are presented here in terms of finite dimensions and using matrices. Many of the ideas extend 
to infinite dimensions using Banach and Hilbert spaces [33]. 

3.1 The Problem 

Given an m by n real matrix A and an m by 1 vector b, find the n by 1 vector x when 

Ax=b (3.1) 

If b does not lie in the range space of A (the space spanned by the columns of A), there is no exact solution 
to (3.1). Therefore, an error is defined by 

e = Ax - b.    (3.2) 

A generalized solution to (3.1) is considered to be an x that minimizes some norm of e, usually e^{T*} e, 
the squared Euclidean length of e. If there is a non-zero solution of the homogeneous equation 

Ax=0, (3.3) 

then (3.1) has many generalized solutions in the sense that any particular solution of (3.1) plus an arbitrary 
scalar times any non-zero solution of (3.3) will have the same error in (3.2) and, therefore, is also a generalized 
solution. 

Examination of the basic problem shows there are ten cases [17] to be considered. These depend on the 
shape of A , the rank r of A , and whether b is in the span of the columns of A . 

1a: m = n = r: One solution with no error. 
1b: m = n > r, b ∈ span{A}: Many solutions with no error. 
1c: m = n > r, b ∉ span{A}: Many solutions with error. 
2a: m > n = r, b ∈ span{A}: One solution with no error. 
2b: m > n = r, b ∉ span{A}: One solution with error. 
2c: m > n > r, b ∈ span{A}: Many solutions with no error. 
2d: m > n > r, b ∉ span{A}: Many solutions with error. 
3a: n > m = r: Many solutions with no error. 
3b: n > m > r, b ∈ span{A}: Many solutions with no error. 
3c: n > m > r, b ∉ span{A}: Many solutions with error. 

In addition to these classifications, the possible orthogonality of the columns or rows of the matrices gives 
special characteristics. 
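
The ten cases can be distinguished mechanically from the shape of A, its rank, and a rank test for whether b lies in the span of the columns. The following is a sketch only: the function name classify and the absolute tolerance are assumptions, and numerical rank decisions are sensitive to that tolerance.

```python
import numpy as np

def classify(A, b, tol=1e-10):
    """Return the case label (for example '2b') for the system A x = b."""
    m, n = A.shape
    r = np.linalg.matrix_rank(A, tol=tol)
    # b is in span{A} exactly when appending b as a column does not raise the rank.
    in_span = np.linalg.matrix_rank(np.column_stack([A, b]), tol=tol) == r
    if m == n:
        if r == n:
            return "1a"
        return "1b" if in_span else "1c"
    if m > n:
        if r == n:
            return "2a" if in_span else "2b"
        return "2c" if in_span else "2d"
    if r == m:
        return "3a"
    return "3b" if in_span else "3c"

case_tall = classify(np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]), np.array([1.0, 2.0, 3.0]))
case_wide = classify(np.array([[1.0, 2.0, 3.0]]), np.array([1.0]))
case_singular = classify(np.array([[1.0, 1.0], [1.0, 1.0]]), np.array([1.0, 2.0]))
```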

There are several assumptions or side conditions that could be used in order to define a useful unique 
solution of (3.1). The side conditions used to define the Moore-Penrose pseudo-inverse are that the norm of 
e be minimized and, if there is ambiguity (several solutions with the same minimum error), the norm of x 
also be minimized. A useful alternative to minimizing the norm of x is to require certain entries in x to be 
zero or fixed to some non-zero value (equality constraints). 

In addition to using side conditions to achieve a unique solution, side conditions are sometimes part of 
the original problem. One important case requires that certain of the equations be satisfied with no error 
and the approximation be achieved with the remaining equations. 

3.2 Moore-Penrose Pseudo-Inverse 

A unique generalized solution to (3.1) always exists such that the equation error e^{T*} e and the norm of the 
solution x are minimized. This solution is denoted by 

x = A^+ b    (3.4) 

where A^+ is called the Moore-Penrose inverse of A. 

Roger Penrose [23][24] showed that for all A, there exists a unique A + satisfying the four conditions: 

• A A^+ A = A 

• A^+ A A^+ = A^+ 

• [A A^+]^* = A A^+ 

• [A^+ A]^* = A^+ A 

There is a large literature on this problem. Five useful books are [17] [1] [3] [5] [26]. The Moore-Penrose 
pseudo-inverse can be calculated in Matlab [7] by the pinv(A,tol) function which uses a singular value decomposition 
to calculate the inverse. There are a variety of other numerical methods given in the above references where 
each has some advantages and some disadvantages. 
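
For readers working in Python rather than Matlab, NumPy provides np.linalg.pinv, which is also based on the singular value decomposition. The sketch below checks the four Penrose conditions on an assumed rank-deficient example. It is an illustration, not a statement about how any of the cited references compute the inverse.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                  # assumed 3 x 2 example of rank 1
A_pinv = np.linalg.pinv(A)                  # SVD-based pseudo-inverse

conditions = [
    np.allclose(A @ A_pinv @ A, A),
    np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    np.allclose((A @ A_pinv).conj().T, A @ A_pinv),
    np.allclose((A_pinv @ A).conj().T, A_pinv @ A),
]
```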

3.3 Properties 

For cases 2a and 2b, the following n by n system of equations, called the normal equations [1][17], has a 
unique minimum equation error solution. 

A^{T*} A x = A^{T*} b    (3.5) 

Solving these equations is often used in least squares approximation problems. For these two cases the 
pseudo-inverse is simply 

A^+ = [A^{T*} A]^{-1} A^{T*}.    (3.6) 
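
As a minimal sketch of case 2b, the following fits a straight line to four points three ways: by solving the normal equations, by applying the pseudo-inverse, and with NumPy's least squares routine. The data values are assumptions for illustration.

```python
import numpy as np

# Assumed overdetermined system with full column rank: fit b ~ x0 + x1 * t.
t = np.array([0.0, 1.0, 2.0, 3.0])
A = np.column_stack([np.ones_like(t), t])
b = np.array([1.0, 2.9, 5.1, 7.0])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations
x_pinv = np.linalg.pinv(A) @ b                      # pseudo-inverse
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # library least squares solver

residual = A @ x_normal - b
orthogonality = A.T @ residual                      # should vanish up to rounding
```

All three agree here. For badly conditioned problems, forming A^T A squares the condition number, so solvers based on QR or the SVD are usually preferred in practice.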



An equivalent definition [1] of the pseudo-inverse can be given in terms of a limit by 

A^+ = \lim_{\delta \to 0} [A^{T*} A + \delta^2 I]^{-1} A^{T*} = \lim_{\delta \to 0} A^{T*} [A A^{T*} + \delta^2 I]^{-1}.    (3.7) 
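
The limit can be observed numerically by shrinking delta and comparing against a directly computed pseudo-inverse. The rank-deficient matrix and the sequence of delta values below are assumptions for illustration. Very small values of delta are avoided because the regularized matrix then becomes ill conditioned.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                  # assumed real, rank-deficient example
A_pinv = np.linalg.pinv(A)
I_n = np.eye(A.shape[1])

errors = []
for delta in [1.0, 1e-1, 1e-2]:
    # [A^T A + delta^2 I]^{-1} A^T, computed by solving a linear system.
    approx = np.linalg.solve(A.T @ A + delta**2 * I_n, A.T)
    errors.append(np.linalg.norm(approx - A_pinv))
```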

Some properties [1][5] are: 

• [A^+]^+ = A 

• [A^+]^* = [A^*]^+ 

• [A^* A]^+ = A^+ A^{*+} 

• \lambda^+ = 1/\lambda for a scalar \lambda \neq 0, else \lambda^+ = 0 

• A^+ = [A^* A]^+ A^* = A^* [A A^*]^+ 

• A^* = A^* A A^+ = A^+ A A^* 



It is informative to consider the range and null spaces [5] of A and A^+. 



R(A) = R(A A^+) = R(A A^*) 

R(A^+) = R(A^*) = R(A^+ A) = R(A^* A) 

R(I - A A^+) = N(A A^+) = N(A^*) = N(A^+) = R(A)^\perp 

R(I - A^+ A) = N(A^+ A) = N(A) = R(A^*)^\perp 
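
Some of these relations can be checked on a small example. In the sketch below, the matrix and the test vector are assumptions. It forms the projectors A A^+ and I - A^+ A and confirms that the latter maps any vector into the null space of A.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])             # assumed 2 x 3 example, one-dimensional null space
A_pinv = np.linalg.pinv(A)

P_range = A @ A_pinv                        # orthogonal projector onto R(A)
P_null = np.eye(3) - A_pinv @ A             # orthogonal projector onto N(A)

y = np.array([1.0, -2.0, 0.5])
z = P_null @ y                              # lies in N(A) for any y
```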



3.4 Geometric interpretation and Least Squares Approximation 

A particularly useful application of the pseudo-inverse of a matrix is to various least squared error approximations. 
A geometric view of the derivation of the normal equations can be helpful. If b does not lie in the 
range space of A, an error vector is defined as the difference between Ax and b. A geometric picture of this 
vector makes it clear that for the length of e to be minimum, it must be orthogonal to the space spanned 
by the columns of A. This means that A^* e = 0. Substituting e = Ax - b from (3.2) gives A^* A x = A^* b, 
which are the normal equations of (3.5). Solving them therefore makes the error orthogonal to the columns 
of A and, therefore, of minimal length. If b does lie in the range space of A, the solution of the normal 
equations gives the exact solution of (3.1) with no error. 

For cases 1b, 1c, 2c, 2d, 3a, 3b, and 3c the homogeneous equation (3.3) has non-zero solutions. Any 
vector in the space spanned by these solutions (the null space of A) does not contribute to the error e 
defined in (3.2) and, therefore, can be added to any particular generalized solution of (3.1) to give a family 
of solutions with the same approximation error. If the dimension of the null space of A is d, it is possible 
to find a unique generalized solution of (3.1) with d zero elements. The non-unique solution for these 
cases can be written in the form [3] 

x = A^+ b + [I - A^+ A] y    (3.8) 

where y is an arbitrary vector. The first term is the minimum norm solution given by the Moore-Penrose 
pseudo-inverse A^+ and the second is a contribution in the null space of A. 
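
The family of solutions in (3.8) can be illustrated with a small example of case 3c. The matrix, the right-hand side, and the random seed are assumptions. The check is that adding a null space component leaves the error unchanged while increasing, or at best preserving, the norm of x.

```python
import numpy as np

rng = np.random.default_rng(0)              # fixed seed so the sketch is repeatable
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])             # assumed rank-1 matrix with n > m > r
b = np.array([1.0, 0.0])                    # not in the span of the columns of A
A_pinv = np.linalg.pinv(A)

x_min = A_pinv @ b                          # minimum-norm generalized solution
P_null = np.eye(3) - A_pinv @ A             # projector onto the null space of A
y = rng.standard_normal(3)                  # arbitrary vector
x_other = x_min + P_null @ y                # another member of the family

err_min = np.linalg.norm(A @ x_min - b)
err_other = np.linalg.norm(A @ x_other - b)
```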

3.5 Least Squares with Constraints 

The solution of the overdetermined simultaneous equations is generally a least squared error approximation 
problem. A particularly interesting and useful variation on this problem adds inequality and/or equality 
constraints. This formulation has proven very powerful in solving the constrained least squares approximation 
part of FIR filter design [14]. The equality constraints can be taken into account by using Lagrange multipliers 
and the inequality constraints can use the Kuhn-Tucker conditions [10] [29] [19]. 
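
For the equality-constrained case, the Lagrange multiplier conditions form a single linear system. The sketch below uses the standard textbook formulation of minimizing the squared error subject to Cx = d. It is not taken from [14], and the matrices, the constraint, and the variable names are assumptions. Inequality constraints are not handled here.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 4.0])
C = np.array([[1.0, -1.0]])                 # assumed constraint: x[0] - x[1] = 0
d = np.array([0.0])

n, p = A.shape[1], C.shape[0]
# Stationarity of ||Ax - b||^2 + lam^T (Cx - d), stacked with the constraint C x = d.
K = np.block([[2 * A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([2 * A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x_c, lam = sol[:n], sol[n:]                 # constrained solution and multiplier
```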
Chapter 4 

Constructing the Operator (Design) 



Solving the third problem posed in the introduction to these notes is rather different from the other two. Here 
we want to find an operator or matrix that when multiplied by x gives b . Clearly a solution to this problem 
would not be unique as stated. In order to pose a better defined problem, we generally give a set or family 
of inputs x and the corresponding outputs b . If these families are independent, and if the number of them 
is the same as the size of the matrix, a unique matrix is defined and can be found by solving simultaneous 
equations. If a smaller number is given, the remaining degrees of freedom can be used to satisfy some other 
criterion. If a larger number is given, there is probably no exact solution and some approximation will be 
necessary. 
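
A minimal sketch of both situations follows. With as many independent inputs as the size of the matrix, stacking them as columns gives A X = B and a unique A. With more inputs, the pseudo-inverse gives a least squares fit. The operator A_true, the inputs, and the perturbation values are hypothetical and serve only to generate data for the example.

```python
import numpy as np

A_true = np.array([[2.0, -1.0],
                   [0.5,  3.0]])            # hypothetical operator used only to generate data

X = np.array([[1.0, 1.0],
              [0.0, 2.0]])                  # two independent inputs as columns
B = A_true @ X                              # corresponding outputs as columns
A_exact = B @ np.linalg.inv(X)              # unique solution of A X = B

X_many = np.array([[1.0, 1.0, 0.0, 2.0],
                   [0.0, 2.0, 1.0, 1.0]])   # more inputs than the size of the matrix
noise = 0.01 * np.array([[1.0, -1.0, 0.5, 0.0],
                         [0.0, 0.5, -1.0, 1.0]])
B_many = A_true @ X_many + noise            # outputs are no longer exactly consistent
A_fit = B_many @ np.linalg.pinv(X_many)     # least squares approximation to the operator
```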

Alternatively, the matrix may be constrained by structure to have fewer than N^2 degrees of freedom. It 
may be a cyclic convolution, a non-cyclic convolution, a Toeplitz, a Hankel, or a Toeplitz plus Hankel matrix. 

This problem came up in research on designing efficient prime length fast Fourier transform (FFT) 
algorithms where x is the data and b is the FFT of x . The problem was to derive an operator that would 
make this calculation using the least amount of arithmetic. We solved it using a special formulation [15] and 
Matlab. 
Linear algebra, vector space methods, and functional analysis are a powerful setting for many topics in 